perf: increase the cache hit rate of group context awareness by RC-CHN · Pull Request #8226 · AstrBotDevs/AstrBot

RC-CHN · 2026-05-18T07:20:11Z

reopen of #8144 , purify commit history

Add raw_records / contexts / summaries data model per group
Add LLM summary compaction strategy alongside truncation
Add turn-based (_split_into_rounds) granularity
Add image caption integration into LTM history
Add tool_call / tool_result persistence into raw_records
Add active reply support driven by LTM state
Improve summary injection prefix with system note and delimiters
Add info-level logging for summary compaction lifecycle
Clarify default summary prompt with explicit preserve/drop rules
Add context_guard for history overflow protection in agent runner
Add internal agent history compaction in agent_sub_stages
Add comprehensive LTM unit tests and compaction test suites

Modifications / 改动点

Core: `astrbot/builtin_stars/astrbot/long_term_memory.py`

Replace max_cnt ring buffer with raw_records (deque) + _raw_cursor + contexts (append-only list). Old segments are never rebuilt.
_build_segments() converts raw chat lines into OpenAI-format context segments, handling tool calls, parallel tools, and multi-step chains.
<BOT/> markers replace [You/] to avoid nickname collisions.
on_agent_done records tool-call chains and now includes the @bot prompt in contexts so future rounds see the user's original message.
asyncio.Lock for concurrency safety; remove_session() for cleanup.

Hook wiring: `astrbot/builtin_stars/astrbot/main.py`

Swap @on_llm_response → @on_agent_done for accurate tool-chain recording.
Lazy toggle detection: false→true cleans stale state on next message.
group_icl_enable=true skips Conversation DB query (conversation=None).

Config: `astrbot/builtin_stars/astrbot/default.py`

Default context_limit_reached_strategy → "llm_compress".

Agent runner: `astrbot/core/astr_main_agent.py`

_get_compress_provider auto-falls back to the main chat provider when llm_compress_provider_id is unset, preventing silent truncation.

Tests: `tests/unit/test_long_term_memory.py`

Pure functions: extract, parse, truncate, build_segments.
Integration: round-trip lifecycle, multi-round accumulation, tool chains, persona preservation, concurrent safety.
This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能，已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试，并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
/ 我确保没有引入新依赖库，或者引入了新依赖库的同时将其添加到 requirements.txt 和 pyproject.toml 文件相应位置。
😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。

sourcery-ai

Sorry @RC-CHN, your pull request is larger than the review limit of 150000 diff characters

gemini-code-assist

Code Review

This pull request introduces a significant upgrade to the Long-Term Memory (LTM v2) system, implementing a more robust architecture for group chat context management. Key changes include the introduction of a RequestContextGuard to protect provider requests from token limits without mutating persistent history, and the implementation of dual compaction strategies (turn-based truncation and LLM-based summarization) for both group and private chats. Feedback identifies several critical issues: potential blocking of message recording due to holding locks across network calls, memory leaks from uncleaned session locks, and resource leaks from uncancelled tasks in the tool loop. Additionally, improvements were suggested for handling malformed JSON in tool arguments, preventing system prompt duplication, and ensuring consistent fallback logic for compression providers.

gemini-code-assist

Code Review

This pull request implements LongTermMemory v2, introducing sophisticated context management for group chats, including logical round splitting and dual compaction strategies (truncation and LLM-based summarization). It also adds a RequestContextGuard to ensure provider requests stay within token limits without mutating canonical history. Feedback identifies several areas for refinement: the raw_records buffer needs trimming during message handling to prevent memory exhaustion, and robust error handling should be added to tool call parsing. Furthermore, the reviewer pointed out a memory leak in the session lock dictionary, suggested optimizing string length calculations for memory checks, and noted that the compression provider fallback logic needs to be correctly implemented to match the intended design.

- Add raw_records / contexts / summaries data model per group - Add LLM summary compaction strategy alongside truncation - Add turn-based (_split_into_rounds) granularity - Add image caption integration into LTM history - Add tool_call / tool_result persistence into raw_records - Add active reply support driven by LTM state - Improve summary injection prefix with system note and delimiters - Add info-level logging for summary compaction lifecycle - Clarify default summary prompt with explicit preserve/drop rules - Add context_guard for history overflow protection in agent runner - Add internal agent history compaction in agent_sub_stages - Add comprehensive LTM unit tests and compaction test suites

…sion removal

- Treat lines starting with <T:CALL>, <T:RES, or <BOT/ as regular user messages when their respective parsers return None, instead of silently dropping them. Defensive guard against malformed internal markers.

Avoid allocating a new bytes object for every string when calculating buffer size in _trim_raw_records. Character count is sufficient for the approximate memory cap.

# Conflicts: # astrbot/builtin_stars/astrbot/main.py

w31r4 · 2026-05-23T16:28:28Z

fix conflict

w31r4

LGTM

… functionality

Dt8333 · 2026-06-02T06:43:40Z

-    fallback_providers = _get_fallback_chat_providers(
-        provider, plugin_context, config.provider_settings
-    )
-    selected_provider = _select_image_chat_provider(provider, req, fallback_providers)


#8498
这里为什么需要删掉这个参数enforce_max_turns=config.max_context_length

？怎么标到这里了

…s for llm compress, fix AftCompact debug log Three context-compaction regression fixes after AstrBotDevs#8226: 1. Restore max_context_length -> enforce_max_turns propagation so normal turn-based truncation works again. 2. Serialize ContentPart and ToolCall objects into plain dicts in _message_to_dict so llm_compress no longer fails with JSON serialization errors. 3. Print _provider_messages (compacted) instead of run_context.messages (unchanged) in AftCompact debug log; truncate long role lists to first4,...,last4 to avoid log spam. Assertions in tests are also hardened to avoid coupling to exact prompt wording.

…context compression, handle compression model modalities (#8530) * fix(context): restore turn cap, serialize content parts and tool calls for llm compress, fix AftCompact debug log Three context-compaction regression fixes after #8226: 1. Restore max_context_length -> enforce_max_turns propagation so normal turn-based truncation works again. 2. Serialize ContentPart and ToolCall objects into plain dicts in _message_to_dict so llm_compress no longer fails with JSON serialization errors. 3. Print _provider_messages (compacted) instead of run_context.messages (unchanged) in AftCompact debug log; truncate long role lists to first4,...,last4 to avoid log spam. Assertions in tests are also hardened to avoid coupling to exact prompt wording. * fix(tool_loop_agent_runner): simplify context handling by removing redundant provider messages * fix(tool_loop_agent_runner): rename context manager variables for clarity * fix: update context compression to use recent token ratio instead of fixed count * fix: enhance LLMSummaryCompressor to sanitize contexts and improve message handling * ruff format --------- Co-authored-by: Soulter <905617992@qq.com>

auto-assign Bot requested review from advent259141 and anka-afk May 18, 2026 07:20

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 18, 2026

sourcery-ai Bot reviewed May 18, 2026

View reviewed changes

dosubot Bot added area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels May 18, 2026

gemini-code-assist Bot reviewed May 18, 2026

View reviewed changes

RC-CHN added 9 commits May 19, 2026 09:40

fix(ltm): handle malformed JSON in tool args and clean up lock on ses…

b42c1d1

…sion removal

fix(ltm): guard against duplicate system prompt note injection

3826e13

fix(ltm): fall back to user message when internal marker parsing fails

bd2500c

- Treat lines starting with <T:CALL>, <T:RES, or <BOT/ as regular user messages when their respective parsers return None, instead of silently dropping them. Defensive guard against malformed internal markers.

fix(ltm): release session lock during LLM summary generation

e10d6d3

fix(ltm): trim raw_records in handle_message to prevent unbounded growth

6aa43d6

perf(ltm): use len(s) instead of len(s.encode()) in trim loop

c1bd4ad

Avoid allocating a new bytes object for every string when calculating buffer size in _trim_raw_records. Character count is sufficient for the approximate memory cap.

feat(ltm): make user segment truncation limits configurable

8a501bd

feat(ltm): pre-fill default LTM summary prompt in config and i18n

031290f

RC-CHN force-pushed the refactor-ltm branch from e98fd16 to 031290f Compare May 19, 2026 01:40

Merge remote-tracking branch 'upstream/master' into refactor-ltm

6591fa9

# Conflicts: # astrbot/builtin_stars/astrbot/main.py

w31r4 self-assigned this May 23, 2026

w31r4 self-requested a review May 23, 2026 16:31

w31r4 approved these changes May 23, 2026

View reviewed changes

dosubot Bot added the lgtm This PR has been approved by a maintainer label May 23, 2026

w31r4 approved these changes May 23, 2026

View reviewed changes

github-actions Bot mentioned this pull request May 24, 2026

🦞 OpenClaw 生态日报 2026-05-24 ivanweng2077/big_model_radar#82

Open

RC-CHN added 4 commits May 25, 2026 09:32

refactor(ltm): hardcode internal segment/trim constants

7e94aed

refactor(ltm): unify compaction strategy with main agent runner

7a6e853

feat(ltm): add @mention weight marker for group chat messages

ad9d4ed

test: fix test failures from LTM compaction unification

3c87f95

github-actions Bot mentioned this pull request May 25, 2026

🦞 OpenClaw 生态日报 2026-05-25 ivanweng2077/big_model_radar#87

Open

chore(dashboard): remove obsolete LTM compaction i18n metadata

ffbc134

github-actions Bot mentioned this pull request May 26, 2026

🦞 OpenClaw 生态日报 2026-05-26 ivanweng2077/big_model_radar#92

Open

Soulter force-pushed the master branch 3 times, most recently from a4c4a7d to 9bd38ca Compare May 28, 2026 16:55

chore: shrink codebase

194c6c1

dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels May 30, 2026

feat(group-chat): implement group chat context management and related…

f564de1

… functionality

Soulter approved these changes May 30, 2026

View reviewed changes

Soulter merged commit 95d8057 into AstrBotDevs:master May 30, 2026
21 checks passed

Soulter changed the title ~~refactor(ltm): redesign long-term memory with context compaction (reopen of #8144)~~ perf: increase the cache hit rate of group context awareness May 30, 2026

Soulter mentioned this pull request May 30, 2026

feat: add group message flow context mode #8243

Closed

5 tasks

lingyun14beta mentioned this pull request May 30, 2026

Fix/ltm: isolate active reply context from long-term memory and session history #7671

Closed

5 tasks

RC-CHN deleted the refactor-ltm branch May 30, 2026 11:04

This was referenced Jun 1, 2026

[Bug] LLM压缩上下文进行压缩时报错。 #8484

Closed

[Bug] V4.25.2版本普通配置项上下文截断失效 #8498

Closed

Dt8333 reviewed Jun 2, 2026

View reviewed changes

Foolllll-J mentioned this pull request Jun 2, 2026

fix(compress): improve context compression, improve kv-cache rate of context compression, handle compression model modalities #8530

Merged

5 tasks

Soulter added a commit that referenced this pull request Jun 3, 2026

fix: fix some bugs in #8226

df6eef0

Reisenbug mentioned this pull request Jun 3, 2026

[Feature] max_context_tokens=0 默认填充模型上下文窗口，对计费类 API 不友好，建议增加防御性告警 #8556

Open

2 tasks

FFFold mentioned this pull request Jun 9, 2026

[Feature] 关于Astrbot设置里面的那些让大模型API资费猛增的"毒点" #8080

Closed

2 tasks

Uh oh!

Conversation

RC-CHN commented May 18, 2026

reopen of #8144 , purify commit history

Modifications / 改动点

Core: astrbot/builtin_stars/astrbot/long_term_memory.py

Hook wiring: astrbot/builtin_stars/astrbot/main.py

Config: astrbot/builtin_stars/astrbot/default.py

Agent runner: astrbot/core/astr_main_agent.py

Tests: tests/unit/test_long_term_memory.py

Screenshots or Test Results / 运行截图或测试结果

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

w31r4 commented May 23, 2026

Uh oh!

w31r4 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Dt8333 Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Dt8333 Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Core: `astrbot/builtin_stars/astrbot/long_term_memory.py`

Hook wiring: `astrbot/builtin_stars/astrbot/main.py`

Config: `astrbot/builtin_stars/astrbot/default.py`

Agent runner: `astrbot/core/astr_main_agent.py`

Tests: `tests/unit/test_long_term_memory.py`

Dt8333 Jun 2, 2026 •

edited

Loading